Clustering by Local Skewering
Author
Abstract
Clustering p-dimensional data by fitting a mixture of K normals has enjoyed renewed interest (for example, see the S-PLUS function “mclust”). However, the number of parameters in the model grows rapidly with the dimension p. For example, even if all the covariance matrices are assumed to be equal, the number of parameters is (K − 1) + Kp + p(p + 1)/2 for the weights, means, and covariance matrix. At ACAS in 2001, Scott introduced the partial mixture component algorithm, which fits only one component of the mixture model at a time. This algorithm requires only 1 + p + p(p + 1)/2 parameters for the weight, mean vector, and covariance matrix. In this talk, we introduce a new algorithm that attempts to find the “best” line through individual clusters. This model requires only 2p − 1 parameters; that is, the parameter count of the new algorithm is linear rather than quadratic in p. By repeatedly reinitializing the search algorithm, all clusters may be identified. Intuitively, the line found is approximately the eigenvector of the local covariance matrix associated with its largest eigenvalue. The GGobi visualization program will be used to illustrate the success of this algorithm on real and simulated data.
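To make the growth rates concrete, the three parameter counts quoted in the abstract can be tabulated for a few dimensions. This is only an illustrative sketch: the function names are hypothetical, and K = 3 is an arbitrary choice made for the example, not a value taken from the talk.

```python
def mixture_params(K: int, p: int) -> int:
    """K-component normal mixture with one shared covariance matrix:
    (K - 1) weights + K * p mean entries + p(p + 1)/2 covariance entries."""
    return (K - 1) + K * p + p * (p + 1) // 2


def partial_mixture_params(p: int) -> int:
    """Single fitted component: 1 weight + p mean entries + p(p + 1)/2 covariance entries."""
    return 1 + p + p * (p + 1) // 2


def line_params(p: int) -> int:
    """Line model from the abstract: 2p - 1 parameters."""
    return 2 * p - 1


if __name__ == "__main__":
    K = 3  # assumed number of components, for illustration only
    print("p  mixture  partial  line")
    for p in (2, 10, 50):
        print(p, mixture_params(K, p), partial_mixture_params(p), line_params(p))
```

With K = 3 and p = 10, the counts are 87, 66, and 19 respectively; at p = 50 they are 1427, 1326, and 99, which shows the quadratic term p(p + 1)/2 dominating the first two models.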
Similar resources
Entropy-based Consensus for Distributed Data Clustering
The ever-larger scale of available data and increasingly restrictive privacy concerns are among the challenges of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with consideration for the confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Generating Optimal Timetabling for Lecturers using Hybrid Fuzzy and Clustering Algorithms
UCTTP is an NP-hard problem that must be solved every semester. The main technique in the presented approach is to analyze data to resolve uncertainties in lecturers’ preferences and constraints within a department, so as to obtain a ranking for each lecturer based on their requirements, with the aim of increasing their satisfaction and develo...
A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS
Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, a great deal of research is still under way to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and can therefore become trapped in a local optimum. In this paper...
Image Annotation Using a Semi-Supervised Spectral Clustering Algorithm
Abstract: Due to the growth in the number of digital images, efficient methods for annotating images are needed. In this paper, a semi-supervised spectral clustering method with relevance feedback is used to annotate digital photos; it overcomes the local-minima problem of clustering methods by using labeled information given by users. The performance of the proposed method is tested on the Corel 5K dataset ...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...